Introduction

This script is used to analyze siRNA screen validation data obtained with High-Content Image analysis. In particular, the siRNA library used is the Dharmacon generated by Andria Schibler and Pedja Jevtic, based on the primary siRNA screen results using the Epigenetics and Custom Nuclear Envelope libraries from ThermoFisher.

The siRNA library was originally received dried in 96 well plates at 0.25 nM. It was resuspended and mixed in 50 ul nuclease-free water to obtain a final concentration of 5 uM, frozen overnight and then thawed to increase oligo siRNA solubility.45 ul were aspirated from the original source plate and dispensed into separate wells to generate a 384 well master Plate. These siRNAs occupied columns 1 to 22 (Partial). All these liquid handling operations were performed using a PerkinElmer Janus instrument, which output all the liquid handling operations logs as text files.384 well imaging ready plates containing spotted siRNA (300 nl) were generated using an Echo525. 300 of ul control siRNA (5uM) were then added to their respective wells in columns 22, 23 and 24. The imaging assay ready plates were sealed with aluminum sealing foil and stored at -20oC until use.

Imaging Assay plates were dried, frozen and then used in reverse transfection experiments. For reverese transfection, plates were thawed, spinned and 20 ul of Optimem + 0.05 ul/well of RNAiMax were added to the plates and incubated for 30’. Then 20 ul of cells were added on top of the siRNA/RNAiMax mix and incubated for 72 hrs.

Fixed and stained plates were imaged on an Opera QEHS microscope using a 40X water immersion objectives. Images were analyzed in Columbus 2.8.1, and image analysis results were exported as tab delimited .txt files.

The script reads library reformatting files used on the Echo (Containing the siRNA layout), the Columbus image analysis results and generates statistical analysis.

Ab markers:

Load packages.

library(plyr)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.1     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::arrange()   masks plyr::arrange()
✖ purrr::compact()   masks plyr::compact()
✖ dplyr::count()     masks plyr::count()
✖ dplyr::desc()      masks plyr::desc()
✖ dplyr::failwith()  masks plyr::failwith()
✖ dplyr::filter()    masks stats::filter()
✖ dplyr::id()        masks plyr::id()
✖ dplyr::lag()       masks stats::lag()
✖ dplyr::mutate()    masks plyr::mutate()
✖ dplyr::rename()    masks plyr::rename()
✖ dplyr::summarise() masks plyr::summarise()
✖ dplyr::summarize() masks plyr::summarize()
ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
library(knitr)
library(data.table)
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     
data.table 1.14.8 using 1 threads (see ?getDTthreads).  Latest news: r-datatable.com
**********
This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode.
This is a Mac. Please read https://mac.r-project.org/openmp/. Please engage with Apple and ask them for support. Check r-datatable.com for updates, and our Mac instructions here: https://github.com/Rdatatable/data.table/wiki/Installation. After several years of many reports of installation problems on Mac, it's time to gingerly point out that there have been no similar problems on Windows or Linux.
**********

Attaching package: ‘data.table’

The following objects are masked from ‘package:lubridate’:

    hour, isoweek, mday, minute, month, quarter, second, wday, week, yday,
    year

The following objects are masked from ‘package:dplyr’:

    between, first, last

The following object is masked from ‘package:purrr’:

    transpose
library(ggthemes)
library(viridis)
Loading required package: viridisLite

Read the siRNA layouts and generate a control layout

Set channel variable names.

green_name = "LMNB"
red_name = "LMNA"

Read the siRNA layout information provided by Dharmacon and select only relevant columns. The custom cherry pick work list file that was generated by the Janus to reformat the siRNA oligos from 96 well to 384.

dt_siRNA <- fread(input = "Cherry_Pick_Worklist.csv")

setnames(dt_siRNA, c("dest_col", 
                  "dest_row",
                  "dest_well"), 
                c("Column",
                  "Row",
                  "WellName"))

glimpse(dt_siRNA)
Rows: 340
Columns: 17
$ .id            <chr> "LP_48199 G-CUSTOM-366598.csv", "LP_48199 G-CUSTOM-…
$ source_rack    <chr> "Plate 1", "Plate 1", "Plate 1", "Plate 1", "Plate …
$ source_well    <chr> "A02", "B02", "C02", "D02", "E02", "F02", "G02", "H…
$ oligo_id       <chr> "D-009329-01", "D-003477-18", "D-011653-01", "D-007…
$ gene_symbol    <chr> "ACAA1", "CREBBP", "EYA3", "SLC22A18", "RPA3", "TP5…
$ gene_id        <int> 30, 1387, 2140, 5002, 6119, 7158, 8520, 9631, 30, 1…
$ gene_accession <chr> "NM_001607", "NM_001079846", "NM_001990", "NM_00255…
$ gi_number      <int> 6598316, 119943101, 26667242, 34734074, 52851430, 5…
$ sequence       <chr> "GAGAUUGCCUGAUUCCUAU", "UCACAGAGAUCCAGGGCGA", "GAUU…
$ source_pos     <int> 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, …
$ source_col     <int> 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, …
$ source_row     <int> 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, …
$ source_row2    <chr> "A", "B", "C", "D", "E", "F", "G", "H", "A", "B", "…
$ dest_pos       <int> 84, 72, 340, 225, 89, 66, 87, 221, 254, 127, 30, 44…
$ Column         <int> 6, 5, 22, 15, 6, 5, 6, 14, 16, 8, 2, 3, 11, 18, 1, …
$ Row            <int> 4, 8, 4, 1, 9, 2, 7, 13, 14, 15, 14, 12, 8, 14, 5, …
$ WellName       <chr> "D6", "H5", "D22", "A15", "I6", "B5", "G6", "M14", …

Create a control layout data.table that contains the positions of the library (sample), empty wells (empty), and controls (negative, positive1 or LMNB1, positive2 or SYNE2, positive3 or LMNA, and killer).

dt_control <- data.table(Column = rep(1:24, each = 16), Row = rep(1:16, 24))
dt_control[, WellName := paste0(LETTERS[Row], Column)]
dt_control[Column %in% 1:21, treatment := "sample"]
dt_control[Column == 22 & Row %in% 1:4, treatment := "sample"]
dt_control[Column == 22 & Row %in% 5:12, treatment := "LMNA"]
dt_control[Column == 22 & Row %in% 13:16, treatment := "empty"]
dt_control[Column %in% 23 & Row %in% seq(1, 16, 2), treatment := "negative"]
dt_control[Column %in% 23 & Row %in% seq(2, 16, 2), treatment := "killer"]
dt_control[Column %in% 24 & Row %in% seq(1, 16, 2), treatment := "LMNB1"]
dt_control[Column %in% 24 & Row %in% seq(2, 16, 2), treatment := "SYNE2"]

glimpse(dt_control)
Rows: 384
Columns: 4
$ Column    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,…
$ Row       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ WellName  <chr> "A1", "B1", "C1", "D1", "E1", "F1", "G1", "H1", "I1", "J…
$ treatment <chr> "sample", "sample", "sample", "sample", "sample", "sampl…

Merge the control layout and the siRNA layout data.tables.

dt_layout <- dt_siRNA[dt_control, on = c("Column", "Row")]

dt_layout

Read and process the Columbus data

Set RegEx patterns for directory searches for logs file data and spot data on a per protocol step basis.

pat_col <- "*.result\\.1\\.txt"# Pattern for Columbus results files

Create a list of the RegEx patterns set in the previous chunk. Important: the list names will be carried over all the next steps!!!

pat_list <- list(col = pat_col)

pat_list

Recursively search the working directory and its subdirectories for files whose name includes the RegEx patterns defined two chunks above. The path_list functon outputs absolute file names. path_list is a list containing all the filenames on a per Janus step basis.

list_files <- function(x){
  dir(path = "input", pattern = x, full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
}

path_list <- llply(pat_list, list_files) 

path_list

Extract file names from absolute path and set them as list element names.

trim_names <- function(x){
  names(x) <- basename(x) # This assigns the filename to the file that it is read
  y <- x ## This is necessary because of scoping issues
}

path_list <- llply(path_list, trim_names) 

Recursively read and merge object level data files as data.frames. Rows are labeled with relative filenames (The .id variable). This and the previous chunks are slightly modified tricks adopted from H. Wickam “Tidy Data” paper.

read_merge <- function(x){
  dt <-as.data.table(ldply(x, fread, integer64 = "character")) 
}

dt_list <- llply(path_list, read_merge)

Separate Columbus data from the other classes of data.

dt_col <- dt_list$col

rm(dt_list)

glimpse(dt_col)
Rows: 768
Columns: 22
$ .id                                                                        <chr> …
$ ScreenName                                                                 <chr> …
$ ScreenID                                                                   <int> …
$ PlateName                                                                  <chr> …
$ PlateID                                                                    <int> …
$ MeasurementDate                                                            <dttm> …
$ MeasurementID                                                              <int> …
$ WellName                                                                   <chr> …
$ Row                                                                        <int> …
$ Column                                                                     <int> …
$ Timepoint                                                                  <int> …
$ Plane                                                                      <int> …
$ `Nuclei Selected - Number of Objects`                                      <int> …
$ `Nuclei Selected - Nucleus Area [µm²] - Mean per Well`                     <dbl> …
$ `Nuclei Selected - Nucleus Roundness - Mean per Well`                      <dbl> …
$ `Nuclei Selected - Nucleus Width [µm] - Mean per Well`                     <dbl> …
$ `Nuclei Selected - Nucleus Length [µm] - Mean per Well`                    <dbl> …
$ `Nuclei Selected - Nucleus Ratio Width to Length - Mean per Well`          <dbl> …
$ `Nuclei Selected - Intensity Nucleus Region Exp2Cam1 Mean - Mean per Well` <dbl> …
$ `Nuclei Selected - Intensity Nucleus Region Exp3Cam2 Mean - Mean per Well` <dbl> …
$ `Number of Analyzed Fields`                                                <int> …
$ Link                                                                       <chr> …

Rename variables

setnames(dt_col, c("Nuclei Selected - Number of Objects",
                   "Nuclei Selected - Nucleus Area [µm²] - Mean per Well",
                   "Nuclei Selected - Nucleus Roundness - Mean per Well",
                   "Nuclei Selected - Intensity Nucleus Region Exp2Cam1 Mean - Mean per Well",
                   "Nuclei Selected - Intensity Nucleus Region Exp3Cam2 Mean - Mean per Well"),
                 c("cell_number",
                   "nuc_area",
                   "nuc_roundness",
                   "nuc_green_int",
                   "nuc_red_int"))

glimpse(dt_col)
Rows: 768
Columns: 22
$ .id                                                               <chr> …
$ ScreenName                                                        <chr> …
$ ScreenID                                                          <int> …
$ PlateName                                                         <chr> …
$ PlateID                                                           <int> …
$ MeasurementDate                                                   <dttm> …
$ MeasurementID                                                     <int> …
$ WellName                                                          <chr> …
$ Row                                                               <int> …
$ Column                                                            <int> …
$ Timepoint                                                         <int> …
$ Plane                                                             <int> …
$ cell_number                                                       <int> …
$ nuc_area                                                          <dbl> …
$ nuc_roundness                                                     <dbl> …
$ `Nuclei Selected - Nucleus Width [µm] - Mean per Well`            <dbl> …
$ `Nuclei Selected - Nucleus Length [µm] - Mean per Well`           <dbl> …
$ `Nuclei Selected - Nucleus Ratio Width to Length - Mean per Well` <dbl> …
$ nuc_green_int                                                     <dbl> …
$ nuc_red_int                                                       <dbl> …
$ `Number of Analyzed Fields`                                       <int> …
$ Link                                                              <chr> …

Merge the Columbus measurements data with the layout data.

dt_data <- dt_col[dt_layout, on = c("Column", "Row")]

glimpse(dt_data)
Rows: 768
Columns: 39
$ .id                                                               <chr> …
$ ScreenName                                                        <chr> …
$ ScreenID                                                          <int> …
$ PlateName                                                         <chr> …
$ PlateID                                                           <int> …
$ MeasurementDate                                                   <dttm> …
$ MeasurementID                                                     <int> …
$ WellName                                                          <chr> …
$ Row                                                               <int> …
$ Column                                                            <int> …
$ Timepoint                                                         <int> …
$ Plane                                                             <int> …
$ cell_number                                                       <int> …
$ nuc_area                                                          <dbl> …
$ nuc_roundness                                                     <dbl> …
$ `Nuclei Selected - Nucleus Width [µm] - Mean per Well`            <dbl> …
$ `Nuclei Selected - Nucleus Length [µm] - Mean per Well`           <dbl> …
$ `Nuclei Selected - Nucleus Ratio Width to Length - Mean per Well` <dbl> …
$ nuc_green_int                                                     <dbl> …
$ nuc_red_int                                                       <dbl> …
$ `Number of Analyzed Fields`                                       <int> …
$ Link                                                              <chr> …
$ i..id                                                             <chr> …
$ source_rack                                                       <chr> …
$ source_well                                                       <chr> …
$ oligo_id                                                          <chr> …
$ gene_symbol                                                       <chr> …
$ gene_id                                                           <int> …
$ gene_accession                                                    <chr> …
$ gi_number                                                         <int> …
$ sequence                                                          <chr> …
$ source_pos                                                        <int> …
$ source_col                                                        <int> …
$ source_row                                                        <int> …
$ source_row2                                                       <chr> …
$ dest_pos                                                          <int> …
$ i.WellName                                                        <chr> …
$ i.WellName.1                                                      <chr> …
$ treatment                                                         <chr> …

Plot the data

Quite a few wells in HTIF00183 clearly had an issue, either due to the sample itself, or to the autofocus. For this reason, these wells will be eliminated from the analysis.


excluded <- c("M7", "N7", "O7", "P7", "O8", "P8")

dt_data <- dt_data[!(PlateName == "HTIF00183" & WellName %in% excluded),]

Calculate the mean and standard devitation (SD) for the negative control siRNA on a per plate basis.

dt_norm <-
    dt_data[treatment == "negative", .(
    neg_n_cells_mean = mean(cell_number, na.rm = TRUE),
    neg_n_cells_sd = sd(cell_number, na.rm = TRUE),
    neg_area_mean = mean(nuc_area, na.rm = TRUE),
    neg_area_sd = sd(nuc_area, na.rm = TRUE),
    neg_round_mean = mean(nuc_roundness, na.rm = TRUE),
    neg_round_sd = sd(nuc_roundness, na.rm = TRUE),
    neg_nuc_red_mean = mean(nuc_red_int, na.rm = TRUE),
    neg_nuc_red_sd = sd(nuc_red_int, na.rm = TRUE),
    neg_nuc_green_mean = mean(nuc_green_int, na.rm = TRUE),
    neg_nuc_green_sd = sd(nuc_green_int, na.rm = TRUE)
    ), by = .id]

glimpse(dt_norm)
Rows: 2
Columns: 11
$ .id                <chr> "Shape_Size_Intensity_hEpigenetics_Val[159522].…
$ neg_n_cells_mean   <dbl> 403.125, 308.375
$ neg_n_cells_sd     <dbl> 48.40141, 29.04645
$ neg_area_mean      <dbl> 192.2003, 192.9557
$ neg_area_sd        <dbl> 6.202427, 4.306389
$ neg_round_mean     <dbl> 0.9634322, 0.9740030
$ neg_round_sd       <dbl> 0.004896193, 0.002212123
$ neg_nuc_red_mean   <dbl> 251.4881, 309.4958
$ neg_nuc_red_sd     <dbl> 37.80239, 15.59650
$ neg_nuc_green_mean <dbl> 1189.074, 1411.724
$ neg_nuc_green_sd   <dbl> 257.6809, 142.6345

Calculate Z-scores based on the mean and SD of the negative controls. Also calculate the negative control normalized values (On a per plate basis).

dt_data <- dt_data[dt_norm, on = ".id"]

z_score <- function(measurement, average, s_dev){
           return((measurement - average)/s_dev)
}

norm_change <-function(measurement, average){
            return(100*(measurement/average))
}

dt_data[, `:=`(n_cells_z_score = z_score(cell_number, neg_n_cells_mean, neg_n_cells_sd),
               n_cells_norm_change = norm_change(cell_number, neg_n_cells_mean),
               area_z_score = z_score(nuc_area, neg_area_mean, neg_area_sd),
               area_norm_change = norm_change(nuc_area, neg_area_mean),
               round_z_score = z_score(nuc_roundness, neg_round_mean, neg_round_sd),
               round_norm_change = norm_change(nuc_roundness, neg_round_mean),
               nuc_red_z_score = z_score(nuc_red_int, neg_nuc_red_mean, neg_nuc_red_sd),
               nuc_red_norm_change = norm_change(nuc_red_int, neg_nuc_red_mean),
               nuc_green_z_score = z_score(nuc_green_int, neg_nuc_green_mean, neg_nuc_green_sd),
               nuc_green_norm_change = norm_change(nuc_green_int, neg_nuc_green_mean))]

glimpse(dt_data)
Rows: 762
Columns: 59
$ .id                                                               <chr> …
$ ScreenName                                                        <chr> …
$ ScreenID                                                          <int> …
$ PlateName                                                         <chr> …
$ PlateID                                                           <int> …
$ MeasurementDate                                                   <dttm> …
$ MeasurementID                                                     <int> …
$ WellName                                                          <chr> …
$ Row                                                               <int> …
$ Column                                                            <int> …
$ Timepoint                                                         <int> …
$ Plane                                                             <int> …
$ cell_number                                                       <int> …
$ nuc_area                                                          <dbl> …
$ nuc_roundness                                                     <dbl> …
$ `Nuclei Selected - Nucleus Width [µm] - Mean per Well`            <dbl> …
$ `Nuclei Selected - Nucleus Length [µm] - Mean per Well`           <dbl> …
$ `Nuclei Selected - Nucleus Ratio Width to Length - Mean per Well` <dbl> …
$ nuc_green_int                                                     <dbl> …
$ nuc_red_int                                                       <dbl> …
$ `Number of Analyzed Fields`                                       <int> …
$ Link                                                              <chr> …
$ i..id                                                             <chr> …
$ source_rack                                                       <chr> …
$ source_well                                                       <chr> …
$ oligo_id                                                          <chr> …
$ gene_symbol                                                       <chr> …
$ gene_id                                                           <int> …
$ gene_accession                                                    <chr> …
$ gi_number                                                         <int> …
$ sequence                                                          <chr> …
$ source_pos                                                        <int> …
$ source_col                                                        <int> …
$ source_row                                                        <int> …
$ source_row2                                                       <chr> …
$ dest_pos                                                          <int> …
$ i.WellName                                                        <chr> …
$ i.WellName.1                                                      <chr> …
$ treatment                                                         <chr> …
$ neg_n_cells_mean                                                  <dbl> …
$ neg_n_cells_sd                                                    <dbl> …
$ neg_area_mean                                                     <dbl> …
$ neg_area_sd                                                       <dbl> …
$ neg_round_mean                                                    <dbl> …
$ neg_round_sd                                                      <dbl> …
$ neg_nuc_red_mean                                                  <dbl> …
$ neg_nuc_red_sd                                                    <dbl> …
$ neg_nuc_green_mean                                                <dbl> …
$ neg_nuc_green_sd                                                  <dbl> …
$ n_cells_z_score                                                   <dbl> …
$ n_cells_norm_change                                               <dbl> …
$ area_z_score                                                      <dbl> …
$ area_norm_change                                                  <dbl> …
$ round_z_score                                                     <dbl> …
$ round_norm_change                                                 <dbl> …
$ nuc_red_z_score                                                   <dbl> …
$ nuc_red_norm_change                                               <dbl> …
$ nuc_green_z_score                                                 <dbl> …
$ nuc_green_norm_change                                             <dbl> …

Write the data to a .csv file for further analysis

write.table(dt_data, 
            paste("output/validation", red_name, green_name, "normalized_results.txt", sep = "_"),
            quote = FALSE,
            sep = "\t",
            row.names = FALSE, 
            col.names = TRUE)

Create a novel unique identifier for each oligo that contains also the gene symbol.

dt_data[,oligo_name := paste(gene_symbol, str_extract(oligo_id, "[0-9]{2}$"), sep = "-")]
ℹ The package "Hmisc" is required.
✖ Would you like to install it?

1: Yes
2: No

Enter an item from the menu, or 0 to exit
trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.2/checkmate_2.1.0.tgz'
Content type 'application/x-gzip' length 702523 bytes (686 KB)
==================================================
downloaded 686 KB

trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.2/htmlwidgets_1.6.2.tgz'
Content type 'application/x-gzip' length 803214 bytes (784 KB)
==================================================
downloaded 784 KB

trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.2/htmlTable_2.4.1.tgz'
Content type 'application/x-gzip' length 419727 bytes (409 KB)
==================================================
downloaded 409 KB

trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.2/Formula_1.2-5.tgz'
Content type 'application/x-gzip' length 158153 bytes (154 KB)
==================================================
downloaded 154 KB

trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.2/Hmisc_5.0-1.tgz'
Content type 'application/x-gzip' length 3434580 bytes (3.3 MB)
==================================================
downloaded 3.3 MB

The downloaded binary packages are in
    /var/folders/lp/2jhntvmx27z1mlsyhx8xpmwhfr_62q/T//Rtmp86jG8C/downloaded_packages

Aggregate the biological repeats (The two different plates) by calculating the mean and SD for all the variables (Z-scores and fold changes). n = 2.

dt_aggregated <- dt_data %>% 
        group_by(gene_symbol,
                 gene_id,
                 oligo_id,
                 oligo_name,
                 sequence) %>%
        summarise(across(n_cells_z_score:nuc_green_norm_change, list(mean, sd))) %>%
        arrange(oligo_name)
`summarise()` has grouped output by 'gene_symbol', 'gene_id', 'oligo_id', 'oligo_name'. You can override using the `.groups` argument.
write.table(dt_aggregated, 
            paste("output/validation", red_name, green_name, "aggregated_results.txt", sep = "_"),
            quote = FALSE,
            sep = "\t",
            row.names = FALSE, 
            col.names = TRUE)

Document the information about the analysis session

sessionInfo()
---
title: "Secondary siRNA Screen Validation - LMNA/LMNB"
author: "Andria Schibler/Gianluca Pegoraro"
date: "March 27th 2023"
output:
  html_document: default
  html_notebook: default
---

### Introduction

This script is used to analyze siRNA screen validation data obtained with High-Content Image analysis. In particular, the siRNA library used is the Dharmacon generated by Andria Schibler and Pedja Jevtic, based on the primary siRNA screen results using the Epigenetics and Custom Nuclear Envelope libraries from ThermoFisher.

The siRNA library was originally received dried in 96 well plates at 0.25 nM. It was resuspended and mixed in 50 ul nuclease-free water to obtain a final concentration of 5 uM, frozen overnight and then thawed to increase oligo siRNA solubility.45 ul were aspirated from the original source plate and dispensed into separate wells to generate a 384 well master Plate. These siRNAs occupied columns 1 to 22 (Partial). All these liquid handling operations were performed using a PerkinElmer Janus instrument, which output all the liquid handling operations logs as text files.384 well imaging ready plates containing spotted siRNA (300 nl) were generated using an Echo525. 300 of ul control siRNA (5uM) were then added to their respective wells in columns 22, 23 and 24. The imaging assay ready plates were sealed with aluminum sealing foil and stored at -20oC until use.

Imaging Assay plates were dried, frozen and then used in reverse transfection experiments. For reverese transfection, plates were thawed, spinned and 20 ul of Optimem + 0.05 ul/well of RNAiMax were added to the plates and incubated for 30'. Then 20 ul of cells were added on top of the siRNA/RNAiMax mix and incubated for 72 hrs.

Fixed and stained plates were imaged on an Opera QEHS microscope using a 40X water immersion objectives. Images were analyzed in Columbus 2.8.1, and image analysis results were exported as tab delimited .txt files.

The script reads library reformatting files used on the Echo (Containing the siRNA layout), the Columbus image analysis results and generates statistical analysis.

### Ab markers:

-   Green: LMNB
-   Red: LMNA

### Load packages.

```{r, warning=FALSE}
library(plyr)
library(tidyverse)
library(knitr)
library(data.table)
library(ggthemes)
library(viridis)
```

```{r, include=FALSE, warning=FALSE}
opts_chunk$set(
    fig.path = "output/",
    cache = FALSE,
    fig.width = 15,
    fig.height = 7,
    message = FALSE,
    warning = FALSE
    )
```

```{r setThemeandPalette, include='false'}
theme_set(theme_minimal())
theme_update(axis.text.x=element_text(angle = -90, hjust = 0, vjust = 0.5))
```

### Read the siRNA layouts and generate a control layout

Set channel variable names.

```{r}
green_name = "LMNB"
red_name = "LMNA"
```

Read the siRNA layout information provided by Dharmacon and select only relevant columns. The custom cherry pick work list file that was generated by the Janus to reformat the siRNA oligos from 96 well to 384.

```{r geneListRead}
dt_siRNA <- fread(input = "Cherry_Pick_Worklist.csv")

setnames(dt_siRNA, c("dest_col", 
                  "dest_row",
                  "dest_well"), 
                c("Column",
                  "Row",
                  "WellName"))

glimpse(dt_siRNA)
```

Create a control layout data.table that contains the positions of the library (`sample`), empty wells (`empty`), and controls (`negative`, `positive1` or LMNB1, `positive2` or SYNE2, `positive3` or LMNA, and `killer`).

```{r controlLayout}
dt_control <- data.table(Column = rep(1:24, each = 16), Row = rep(1:16, 24))
dt_control[, WellName := paste0(LETTERS[Row], Column)]
dt_control[Column %in% 1:21, treatment := "sample"]
dt_control[Column == 22 & Row %in% 1:4, treatment := "sample"]
dt_control[Column == 22 & Row %in% 5:12, treatment := "LMNA"]
dt_control[Column == 22 & Row %in% 13:16, treatment := "empty"]
dt_control[Column %in% 23 & Row %in% seq(1, 16, 2), treatment := "negative"]
dt_control[Column %in% 23 & Row %in% seq(2, 16, 2), treatment := "killer"]
dt_control[Column %in% 24 & Row %in% seq(1, 16, 2), treatment := "LMNB1"]
dt_control[Column %in% 24 & Row %in% seq(2, 16, 2), treatment := "SYNE2"]

glimpse(dt_control)
```

Merge the control layout and the siRNA layout data.tables.

```{r siRNAControlMerge}
dt_layout <- dt_siRNA[dt_control, on = c("Column", "Row")]

dt_layout
```

### Read and process the Columbus data

Set RegEx patterns for directory searches for logs file data and spot data on a per protocol step basis.

```{r regexFilename}
pat_col <- "*.result\\.1\\.txt"# Pattern for Columbus results files
```

Create a list of the RegEx patterns set in the previous chunk. **Important:** the list names will be carried over all the next steps!!!

-   col = Columbus data

```{r patList, results = 'hide'}
pat_list <- list(col = pat_col)

pat_list
```

Recursively search the working directory and its subdirectories for files whose name includes the RegEx patterns defined two chunks above. The `path_list` functon outputs absolute file names. `path_list` is a list containing all the filenames on a per Janus step basis.

```{r directorySearch, results='hide'}
list_files <- function(x){
  dir(path = "input", pattern = x, full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
}

path_list <- llply(pat_list, list_files) 

path_list
```

Extract file names from absolute path and set them as list element names.

```{r trimNames, results='hide'}
trim_names <- function(x){
  names(x) <- basename(x) # This assigns the filename to the file that it is read
  y <- x ## This is necessary because of scoping issues
}

path_list <- llply(path_list, trim_names) 
```

Recursively read and merge object level data files as data.frames. Rows are labeled with relative filenames (The `.id` variable). This and the previous chunks are slightly modified tricks adopted from H. Wickam ["Tidy Data" paper](http://vita.had.co.nz/papers/tidy-data.pdf).

```{r readMerge, results='hide', warning=F}
read_merge <- function(x){
  dt <-as.data.table(ldply(x, fread, integer64 = "character")) 
}

dt_list <- llply(path_list, read_merge)
```

Separate Columbus data from the other classes of data.

```{r separateColumbusData}
dt_col <- dt_list$col

rm(dt_list)

glimpse(dt_col)
```

Rename variables

```{r}
setnames(dt_col, c("Nuclei Selected - Number of Objects",
                   "Nuclei Selected - Nucleus Area [µm²] - Mean per Well",
                   "Nuclei Selected - Nucleus Roundness - Mean per Well",
                   "Nuclei Selected - Intensity Nucleus Region Exp2Cam1 Mean - Mean per Well",
                   "Nuclei Selected - Intensity Nucleus Region Exp3Cam2 Mean - Mean per Well"),
                 c("cell_number",
                   "nuc_area",
                   "nuc_roundness",
                   "nuc_green_int",
                   "nuc_red_int"))

glimpse(dt_col)
```

Merge the Columbus measurements data with the layout data.

```{r resultsLayoutMerge}
dt_data <- dt_col[dt_layout, on = c("Column", "Row")]

glimpse(dt_data)
```

# Plot the data

```{r layoutPlot, echo=FALSE}
layout_plot <- ggplot(dt_data, aes(x = Column, y = Row, fill = treatment))

layout_plot + geom_tile(color = "black") +
    scale_y_reverse(breaks = 1:16, labels = LETTERS[1:16]) +
    scale_x_continuous(breaks = 1:24) +
    scale_fill_tableau(name = "siRNA") +
    ggtitle("siRNA Layout")
```

```{r n_cellsHeatmap, fig.height = 8, fig.width = 8, echo=FALSE}
n_cells_heatmap_plot <- ggplot(dt_data, aes(x = Column, y = Row, fill = cell_number))

n_cells_heatmap_plot + geom_tile(color = "black") +
    scale_y_reverse(breaks = 1:16, labels = LETTERS[1:16]) +
    scale_x_continuous(breaks = 1:24) +
    scale_fill_viridis("Cell Number") +
    facet_wrap(~ PlateName, ncol = 1) +
    ggtitle("Cells Number")
```

```{r areaHeatmap, fig.height = 8, fig.width = 8, echo=FALSE}
area_heatmap_plot <- ggplot(dt_data, aes(x = Column, y = Row, fill = nuc_area))

area_heatmap_plot + geom_tile(color = "black") +
    scale_y_reverse(breaks = 1:16, labels = LETTERS[1:16]) +
    scale_x_continuous(breaks = 1:24) +
    scale_fill_viridis("Area [µm²]") +
    facet_wrap(~ PlateName, ncol = 1) +
    ggtitle("Raw Area Measurement")
```

```{r roundHeatmap, fig.height = 8, fig.width = 8,, echo=FALSE}
round_heatmap_plot <- ggplot(dt_data, aes(x = Column, y = Row, fill = nuc_roundness))

round_heatmap_plot + geom_tile(color = "black") +
    scale_y_reverse(breaks = 1:16, labels = LETTERS[1:16]) +
    scale_x_continuous(breaks = 1:24) +
    scale_fill_viridis("Roundness") +
    facet_wrap(~ PlateName, ncol = 1) +
    ggtitle("Raw Roundness Measurement")
```

```{r nucRedHeatmap, fig.height = 8, fig.width = 8, echo=FALSE}
nuc_red_heatmap_plot <- ggplot(dt_data, aes(x = Column, y = Row, fill = nuc_red_int))

nuc_red_heatmap_plot + geom_tile(color = "black") +
    scale_y_reverse(breaks = 1:16, labels = LETTERS[1:16]) +
    scale_x_continuous(breaks = 1:24) +
    scale_fill_viridis(paste(red_name,"Nuclear Intensity (A.U.)")) +
    facet_wrap(~ PlateName, ncol = 1) +
    ggtitle(paste(red_name, "Nuclear Intensity"))
```

```{r nucgreenHeatmap, fig.height = 8, fig.width = 8, echo=FALSE}
nuc_green_heatmap_plot <- ggplot(dt_data, aes(x = Column, y = Row, fill = nuc_green_int))

nuc_green_heatmap_plot + geom_tile(color = "black") +
    scale_y_reverse(breaks = 1:16, labels = LETTERS[1:16]) +
    scale_x_continuous(breaks = 1:24) +
    scale_fill_viridis(paste(green_name,"Nuclear Intensity (A.U.)")) +
    facet_wrap(~ PlateName, ncol = 1) +
    ggtitle(paste(green_name, "Nuclear Intensity"))
```

Quite a few wells in HTIF00183 clearly had an issue, either due to the sample itself, or to the autofocus. For this reason, **these wells will be eliminated from the analysis**.

```{r}

excluded <- c("M7", "N7", "O7", "P7", "O8", "P8")

dt_data <- dt_data[!(PlateName == "HTIF00183" & WellName %in% excluded),]
```

Calculate the mean and standard devitation (SD) for the negative control siRNA on a per plate basis.

```{r}
dt_norm <-
    dt_data[treatment == "negative", .(
    neg_n_cells_mean = mean(cell_number, na.rm = TRUE),
    neg_n_cells_sd = sd(cell_number, na.rm = TRUE),
    neg_area_mean = mean(nuc_area, na.rm = TRUE),
    neg_area_sd = sd(nuc_area, na.rm = TRUE),
    neg_round_mean = mean(nuc_roundness, na.rm = TRUE),
    neg_round_sd = sd(nuc_roundness, na.rm = TRUE),
    neg_nuc_red_mean = mean(nuc_red_int, na.rm = TRUE),
    neg_nuc_red_sd = sd(nuc_red_int, na.rm = TRUE),
    neg_nuc_green_mean = mean(nuc_green_int, na.rm = TRUE),
    neg_nuc_green_sd = sd(nuc_green_int, na.rm = TRUE)
    ), by = .id]

glimpse(dt_norm)
```

Calculate Z-scores based on the mean and SD of the negative controls. Also calculate the negative control normalized values (On a per plate basis).

```{r}
dt_data <- dt_data[dt_norm, on = ".id"]

z_score <- function(measurement, average, s_dev){
           return((measurement - average)/s_dev)
}

norm_change <-function(measurement, average){
            return(100*(measurement/average))
}

dt_data[, `:=`(n_cells_z_score = z_score(cell_number, neg_n_cells_mean, neg_n_cells_sd),
               n_cells_norm_change = norm_change(cell_number, neg_n_cells_mean),
               area_z_score = z_score(nuc_area, neg_area_mean, neg_area_sd),
               area_norm_change = norm_change(nuc_area, neg_area_mean),
               round_z_score = z_score(nuc_roundness, neg_round_mean, neg_round_sd),
               round_norm_change = norm_change(nuc_roundness, neg_round_mean),
               nuc_red_z_score = z_score(nuc_red_int, neg_nuc_red_mean, neg_nuc_red_sd),
               nuc_red_norm_change = norm_change(nuc_red_int, neg_nuc_red_mean),
               nuc_green_z_score = z_score(nuc_green_int, neg_nuc_green_mean, neg_nuc_green_sd),
               nuc_green_norm_change = norm_change(nuc_green_int, neg_nuc_green_mean))]

glimpse(dt_data)
```

Write the data to a .csv file for further analysis

```{r}
write.table(dt_data, 
            paste("output/validation", red_name, green_name, "normalized_results.txt", sep = "_"),
            quote = FALSE,
            sep = "\t",
            row.names = FALSE, 
            col.names = TRUE)
```

Create a novel unique identifier for each oligo that contains also the gene symbol.

```{r}
dt_data[,oligo_name := paste(gene_symbol, str_extract(oligo_id, "[0-9]{2}$"), sep = "-")]
```

```{r n_cellsZPlot, fig.height=8, fig.width=25, echo=FALSE}
theme_update(axis.text.x=element_text(size = 6))

n_cells_plot_z<- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = n_cells_z_score))

n_cells_plot_z + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 0) +
            
            xlab("siRNA id") +
            ylab("Z-score +/- SD") +
            ggtitle("Cell Number")
```

```{r n_cellsNormPlot, fig.height=8, fig.width=25, echo=FALSE}
n_cells_plot_fold <- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = n_cells_norm_change))

n_cells_plot_fold + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 100) +
            
            xlab("siRNA id") +
            ylab("% Negative Control +/- SD") +
            ggtitle("Cell Number")
```

```{r areaZPlot, fig.height=8, fig.width=25, echo=FALSE}
area_plot_z<- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = area_z_score))

area_plot_z + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 0) +
            
            xlab("siRNA id") +
            ylab("Z-score +/- SD") +
            ggtitle("Nuclear Area")
```

```{r areaNormPlot, fig.height=8, fig.width=25, echo=FALSE}
area_plot_fold <- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = area_norm_change))

area_plot_fold + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 100) +
            
            xlab("siRNA id") +
            ylab("% Negative Control +/- SD") +
            ggtitle("Nuclear Area")
```

```{r roundZPlot, fig.height=8, fig.width=25, echo=FALSE}
round_plot_z<- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = round_z_score))

round_plot_z + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 0) +
            
            xlab("siRNA id") +
            ylab("Z-score +/- SD") +
            ggtitle("Nuclear Roundness")
```

```{r roundNormPlot, fig.height=8, fig.width=25, echo=FALSE}
round_plot_fold <- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = round_norm_change))

round_plot_fold + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 100) +
            
            xlab("siRNA id") +
            ylab("% Negative Control +/- SD") +
            ggtitle("Nuclear Roundness")
```

```{r nucRedZPlot, fig.height=8, fig.width=25, echo=FALSE}
nuc_red_plot_z<- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = nuc_red_z_score))

nuc_red_plot_z + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 0) +
            
            xlab("siRNA id") +
            ylab("Z-score +/- SD") +
            ggtitle(paste(red_name, "Nuclear Intensity"))
```

```{r nucRedNormPlot, fig.height=8, fig.width=25, echo=FALSE}
nuc_red_plot_fold <- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = nuc_red_norm_change))

nuc_red_plot_fold + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 100) +
            
            xlab("siRNA id") +
            ylab("% Negative Control +/- SD") +
            ggtitle(paste(red_name, "Nuclear Intensity"))
```

```{r nucgreenZPlot, fig.height=8, fig.width=25, echo=FALSE}
nuc_green_plot_z<- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = nuc_green_z_score))

nuc_green_plot_z + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 0) +
            
            xlab("siRNA id") +
            ylab("Z-score +/- SD") +
            ggtitle(paste(green_name, "Nuclear Intensity"))
```

```{r nucgreenNormPlot, fig.height=8, fig.width=25, echo=FALSE}
nuc_green_plot_fold <- ggplot(dt_data[!(is.na(oligo_id)),], aes(x = oligo_name, y = nuc_green_norm_change))

nuc_green_plot_fold + stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) + 
            geom_hline(yintercept = 100) +
            
            xlab("siRNA id") +
            ylab("% Negative Control +/- SD") +
            ggtitle(paste(green_name, "Nuclear Intensity"))
```

Aggregate the biological repeats (The two different plates) by calculating the mean and SD for all the variables (Z-scores and fold changes). **n = 2**.

```{r}
dt_aggregated <- dt_data %>% 
        group_by(gene_symbol,
                 gene_id,
                 oligo_id,
                 oligo_name,
                 sequence) %>%
        summarise(across(n_cells_z_score:nuc_green_norm_change, list(mean, sd))) %>%
        arrange(oligo_name)

write.table(dt_aggregated, 
            paste("output/validation", red_name, green_name, "aggregated_results.txt", sep = "_"),
            quote = FALSE,
            sep = "\t",
            row.names = FALSE, 
            col.names = TRUE)
```

Document the information about the analysis session

```{r sessionInfo, include=TRUE, echo=TRUE, results='markup'}
sessionInfo()
```
